Grammar Inference and Statistical Machine Translation
Abstract
NLP researchers face a dilemma. On the one hand, it is widely accepted that languages have internal structure and are not merely strings of words. On the other hand, they find it very difficult and expensive to write grammars that have good coverage of language structures. Statistical machine translation tries to cope with this problem by ignoring language structures and using statistical models to depict the translation process. Most of the translation models are word-based. While the approach has achieved surprisingly good performance, comparable to the best commercial systems, many questions remain in the machine translation community. Can statistical word-based translation still perform well on language pairs with radically different linguistic structures? How would it function with less training data or with spoken languages? The thesis work investigated these questions. In summary, the word-based alignment model is a major cause of errors in German-English statistical spoken language translation. To address this problem, a structure-based alignment model is introduced. This new model takes advantage of a bilingual grammar inference algorithm, which automatically acquires the shallow phrase structures used by the model. The structure-based model can directly depict the structural differences between spoken English and spoken German. It also results in focused learning of word alignment and can therefore alleviate the sparse data problem. The structure-based model achieved an 11 percent error reduction over the state-of-the-art statistical machine translation models.
Similar resources
A Bayesian Model for Learning SCFGs with Discontiguous Rules
We describe a nonparametric model and corresponding inference algorithm for learning Synchronous Context Free Grammar derivations for parallel text. The model employs a Pitman-Yor Process prior which uses a novel base distribution over synchronous grammar rules. Through both synthetic grammar induction and statistical machine translation experiments, we show that our model learns complex transl...
Grammatical Inference for Syntax-Based Statistical Machine Translation
In this article we present a syntax-based translation system, called TABL (Translation using Alignment-Based Learning). It translates natural language sentences by mapping grammar rules (which are induced by the Alignment-Based Learning grammatical inference framework) of the source language to those of the target language. By parsing a sentence in the source language, the grammar rules in the ...
Finite-state transducer inference for a spee machine translation
Statistical techniques and grammatical inference have been used for dealing with automatic speech recognition with success, and can also be used for speech-to-speech machine translation. In this paper, new advances on a method for finite-state transducer inference are presented. This method has been tested experimentally in a speech-input translation task using a recognizer that allows a flexib...
Finite-state transducer inference for a speech-input Portuguese-to-English machine translation system
Statistical techniques and grammatical inference have been used for dealing with automatic speech recognition with success, and can also be used for speech-to-speech machine translation. In this paper, new advances on a method for finite-state transducer inference are presented. This method has been tested experimentally in a speech-input translation task using a recognizer that allows a flexib...
GREAT: A Finite-State Machine Translation Toolkit Implementing a Grammatical Inference Approach for Transducer Inference (GIATI)
GREAT is a finite-state toolkit which is devoted to Machine Translation and that learns structured models from bilingual data. The training procedure is based on grammatical inference techniques to obtain stochastic transducers that model both the structure of the languages and the relationship between them. The inference of grammars from natural language causes the models to become larger when...
Publication date: 1998